INVESTIGATION 



Insights into the Evolution of Cotton Diploids and 
Polyploids from Whole-Genome Re-sequencing 

Justin T. Page,* Mark D. Huynh/ Zach S. Liechty/ Kara Grupp,* David Stelly, § Amanda M. Hulse, § 

Hamid Ashrafi,** Allen Van Deynze,** Jonathan F. Wendel,* and Joshua A. Udall* + ' 1 

*Biology Department, Brigham Young University, Provo, Utah 84602, + Plant and Wildlife Science Department, Brigham 
Young University, Provo, Utah 84602, ^Department of Ecology, Evolution, and Organismal Biology, Iowa State University, 
Ames, Iowa 5001 1, ^Department of Soil and Crop Sciences, Texas AgriLife Research, College Station, Texas 77843, 
and **Seed Biotechnology Center, University of California-Davis, Davis, California 95616 



ABSTRACT Understanding the composition, evolution, and function of the Gossypium hirsutum (cotton) 
genome is complicated by the joint presence of two genomes in its nucleus (A T and D T genomes). These 
two genomes were derived from progenitor A-genome and D-genome diploids involved in ancestral 
allopolyploidization. To better understand the allopolyploid genome, we re-sequenced the genomes of 
extant diploid relatives that contain the A-i {Gossypium herbaceum), A 2 {Gossypium arboreum), or D 5 
{Gossypium raimondii) genomes. We conducted a comparative analysis using deep re-sequencing of mul- 
tiple accessions of each diploid species and identified 24 million SNPs between the A-diploid and D-diploid 
genomes. These analyses facilitated the construction of a robust index of conserved SNPs between the 
A-genomes and D-genomes at all detected polymorphic loci. This index is widely applicable for read 
mapping efforts of other diploid and allopolyploid Gossypium accessions. Further analysis also revealed 
locations of putative duplications and deletions in the A-genome relative to the D-genome reference 
sequence. The approximately 25,400 deleted regions included more than 50% deletion of 978 genes, 
including many involved with starch synthesis. In the polyploid genome, we also detected 1 ,472 conversion 
events between homoeologous chromosomes, including events that overlapped 113 genes. Continued 
characterization of the Gossypium genomes will further enhance our ability to manipulate fiber and agro- 
nomic production of cotton. 
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We aligned re-sequencing reads to the existing D-genome sequence 
and discovered novel changes between the A-genomes and D-genomes 
in both diploid and polyploid plants. We identified single base differ- 
ences throughout the genome between the diploid genomes and dis- 
covered that 978 genes of the D-genome reference sequence are 
consistently deleted in the A-genome. We discovered that approxi- 
mately 900 Kbp of sequence in the polyploid genome have been 
converted from one genome to another in separate conversion events 
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scattered across the genome. These discoveries help us better under- 
stand the dynamic nature of polyploid genomes and provide many 
avenues for further genomic research in cotton. 

The genus Gossypium (cotton) includes approximately 45 diploid 
species that are divided into eight monophyletic groups, each designated 
by a single letter ("A" through "G" and "K," hereafter referred to as 
genome groups) (Wendel et al 2012). Ancient hybridization between A 
and D diploids resulted in a new allopolyploid (AD) lineage in the New 
World approximately 1-2 million years ago (Wendel 1989). Two of the 
descendant allopolyploid species — Gossypium hirsutum (AD X ) and Gos- 
sypium barbadense (AD 2 ) — as well as two African- Asian A diploids — 
Gossypium herbaceum (A x ) and Gossypium arboreum (A 2 ) — were each 
independently domesticated for their long, spinnable, epidermal seed 
trichomes. These four species collectively provide the world's cotton 
fiber production, with more than 90% of this total being attributable 
to the cultivation of "upland cotton," G. hirsutum (Wendel and Cronn 
2003). Understanding the cotton genome is important for facilitating 
advances in crop variety development and utilization. In addition, 
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insights into polyploid evolution in cotton may further our under- 
standing of other polyploid crops. 

Molecular studies and comparisons between diploid cotton species 
have revealed a genus with extraordinary genome dynamics. For 
example, there is a nearly three-fold variation in genome sizes among 
diploids (Wendel and Cronn 2003; Wendel et al. 2012), with the 
A-genome (1.7 Gbp) being nearly twice the size of the D-genome 
(0.9 Gbp), largely because of the proliferation of GORGE3 gypsy-like 
retrotransposons (Hawkins et al. 2006). Despite this size difference, 
comparative mapping studies have indicated that gene order and co- 
linearity have been largely conserved between the diploid A-genomes 
and D-genomes (Brubaker et al. 1999), with the corollary that most 
genome size diversity reflects variation in the rates of proliferation and 
deletion of repetitive elements (Hawkins et al 2009; Grover and 
Wendel 2010). Molecular phylogenetic and dating studies indicate 
that the A-genomes and D-genomes diverged approximately 5-10 
million years ago. The F-genome of Gossypium longicalyx diverged 
from the A-genome after the A-D divergence, making it a suitable 
outgroup for a comparison of the A-genome diploids. 

The respective A and D diploid genomes are closely related to the 
two homoeologous genomes in allopolyploid cotton, A T and D T ("T" 
denotes tetraploid), because allopolyploidization is thought to have 
occurred during the mid-Pleistocene era, or 1-2 million years ago 
(Wendel 1989). Consequently, genome differences between diploids 
A 2 and D 5 serve as a fair approximation of the differences between A T 
and D T tetraploid genomes (Udall 2006; Flagel et al 2012). Thus, the 
existence of models of the diploid progenitors of allopolyploid cotton 
provide powerful reference points for inference of homoeology (e.g., of 
genes, transcripts, RNA-seq reads) in allopolyploid cotton. The recent 
publication of the genome sequence of the D-genome diploid (Gos- 
sypium raimondii; Paterson et al. 2012) allows for the development of 
new analytical and comparative approaches for the genomics of both 
diploid and polyploid cotton. For example, a tool was recently created 
to assign the sequence reads of allopolyploid cotton to their respective 
genome after mapping reads from A-diploids and AD-polyploids to 
the G. raimondii (D 5 ) reference sequence (Page et al. 2013). 

Combined with the rapid increase in available sequence data, these 
new genomic approaches may facilitate molecular and traditional 
improvement efforts of cotton. For example, analysis of reads from 
diploids mapped to a single genome reference provides a straightfor- 
ward method to identify single nucleotide polymorphisms (SNPs) 
between and within genomes, because each alignment of reads has 
identical relative positions (Page et al. 2013). In considering the rela- 
tionships among sequences from A-genome, D-genome, and AD- 
genome cotton species, it is useful to distinguish between two classes 
of SNPs. Briefly, homoeo-SNPs are fixed differences that distinguish 
(and hence diagnose) the A-genomes and D-genomes. Allele-SNPs, 
however, are traditional segregating polymorphisms within a single 
genome, between the two alleles of an individual accession (i.e., het- 
erozygosity) or between the corresponding homozygous alleles of dif- 
ferent accessions. Allele-SNPs are those historically used by breeders 
to improve cotton cultivars in marker-assisted or genomic selection 
methods. Homoeo-SNPs add another layer to practical utility of allele- 
SNPs in that they provide a genomic feature to distinguish between 
duplicate gene copies. Homoeo-SNPs are also useful in an evolution- 
ary context because their analysis offers insights into the molecular 
evolutionary properties of allopolyploid cotton and, more generally, 
allopolyploid genomes. 

To better understand both diploid and allopolyploid cotton 
genomes, we performed deep whole-genome re-sequencing of several 
diploid accessions of both A-genome and D-genome diploids. Our 



first objective was to determine all of the homoeo-SNPs between the 
A-genomes and D-genomes. Using reads from these diploids and 
publicly available reads from diploid and allopolyploid cottons, we 
compiled a database of SNPs between the various genomes studied. 
Our second objective was to describe genome evolution between the 
genomes that could be characterized by read coverage. We examined 
loci that are either duplicated or deleted in the A-genome species, 
based on coverage of A-genome reads mapped to the D 5 -genome 
reference. Where those duplications or deletions overlap with genes, 
they may provide insight into the evolutionary basis for the pheno- 
typic differences among diploids, including the production of spin- 
nable fiber in A-genome diploid species. Our third objective was to 
document the extent of genome interaction based on sequence data in 
the polyploid (i.e., conversion events). A robust description of con- 
version events throughout the cotton genomes will serve as a bioinfor- 
matic aide to future genomic analyses of allopolyploid cotton. 

MATERIALS AND METHODS 
Plant material 

Plant material was grown and harvested from greenhouses at Brigham 
Young University (D 5 -2, D 5 -31, Ai-155, A 2 -34, A 2 -1011), Iowa State 
University (D 5 -4, D 5 -53, A 2 -4, Ai-73), and Texas A&M University 
(A 2 -255). DNA was extracted from four accessions of G. raimondii 
(D 5 -2, D 5 -4, D 5 -31, D 5 -53), two accessions of G. herbaceum (Ax-73, 
Ai-155), and four accessions of G. arboreum (A 2 -4, A 2 -34, A 2 -255, 
A 2 -1011) using a Qiagen DNeasy plant kit. 

Acquisition of DNA sequence 

After shearing DNA with a Covaris instrument at the Huntsman 
Cancer Institute (Salt Lake City, UT), DNA libraries were prepared 
with the Illumina TruSeq V3 kit and sequenced by Beijing Genome 
Institute (BGI, Sacramento, CA), producing 100-bp paired-end reads. 
We assumed that the Illumina library construction process would 
perform equally well on high-quality DNA of the A-genomes and 
D-genomes. Approximately 40-times the genomic coverage was 
obtained for each library (Table 1). Reads from the diploids have been 
deposited in the NCBI Sequence Read Archive (SRA) under the fol- 
lowing entries: PRJNA202235, PRJNA202236, and PRJNA202239 for 
G. arboreum, G. herbaceum, and G. raimondii, respectively. Additional 
genomic sequence reads for G. longicalyx (Fi-1; SRR617255), G. her- 
baceum (Ai-97; SRR617256, SRR617284, SRR617704), and G. hirsu- 
tism cv. Maxxa (SRR6 17482) were obtained from the Sequence Read 
Archive. All reads were trimmed for quality with Sickle using 
a minimum phred quality threshold of 20 (https://github.com/ 
najoshi/sickle). 

Homoeo-SNP index 

An index of homoeo-SNPs between the A-genomes and D-genomes 
was produced by comparing sequence data from nine Gossypium 
diploids (Ai-97, A r 155, A 2 -34, A 2 -255, A 2 -1011 vs. D 5 -2, D 5 -4, 



Table 1 Transitions and transversions in the homoeo-SNP index 
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Rows = A allele. Columns = D allele. There was an overall transition/transversion 
ratio of -1.92 and GC fractions of 45.4% (A genome) and 45.1% (D genome). 
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D 5 -31, D 5 -53). First, all reads were mapped with GSNAP (Wu and 
Nacu 2010) using the options "-nl-Q" (requiring unique best map- 
ping for each read) to the 13 chromosomes of the D 5 reference se- 
quence (Paterson et al 2012). Second, alignment files were processed 
with SAMtools to produce sorted BAM files (Li et al 2009). Third, we 
used InterSnp, a custom code built on the BAMtools API (https:// 
github.com/pezmaster31/bamtools) and available as part of the Bam- 
Bam package (http://udall-lab.byu.edu/Research/Software/BamBam. 
aspx) to call SNPs with at least 10-times coverage and a minimum 
minor allele frequency of 40%. Because the alignments from diploids 
used the same reference genome, homologous loci in the A-genomes 
and D-genomes were readily compared to identify SNPs between 
genomes (homoeo-SNPs). Homoeo-SNPs were called at a locus posi- 
tion of the D-genome reference when all diploid genomes with cov- 
erage at that particular locus were homozygous, all A-genome diploids 
had the same base, and all D-genome diploids had the same base, 
different from the A-genome diploids. Finally, loci with identified 
homoeo-SNPs were tabulated into a text file that was converted into 
a homoeo-SNP index for use by GSNAP and PolyCat. 

SNP identification and diversity analysis 

Using the homoeo-SNP index, we again mapped the sequence reads 
from the nine diploids, this time using the SNP-tolerant mapping 
("-v" option) of GSNAP. We also mapped reads for three additional 
diploids (Fi-1, A x -73, A 2 -4) and one allopolyploid (G. hirsutum cv. 
Maxxa). GSNAP and SAMtools were otherwise used as noted. Reads 
from the tetraploid Maxxa were assigned to the A T -genomes and D T - 
genomes using PolyCat (Page et al 2013). All SNPs (homoeo-SNPs 
and allele-SNPs) were called by InterSnp between the 13 resulting 
BAM files, one for each A or D diploid, one for the A T -genome of 
Maxxa, and one for the D T -genome of Maxxa. The number of het- 
erozygous loci in each individual was summarized after filtering loci 
within conserved duplications. We constructed a neighbor-joining 
tree for the various diploid accessions, as well as the A T -genomes 
and D T -genomes, using the PHYLIP (Felsenstein 1989) program 
"neighbor" and default settings. The distance matrix consisted of 
the percentage of aligned sites that differed in pairwise comparisons. 

We used homoeo-SNPs between the diploids to generate a "pseudo- 
A" genome, with the A alleles substituted into the D 5 reference. We did 
the same with the Maxxa homoeo-SNPs to make "pseudo-A T " and 
"pseudo-D T " genomes. Although these pseudo-genomes did not have 
indels or structural variations that are present in the actual A, A T , and 
D T genomes, the majority of gene sequences were conserved (Flagel 
et al 2012; Paterson et al 2012). Thus, these pseudo-genomes served to 
characterize the location of allele-SNPs within genes and other con- 
served noncoding sequences, and having each genome on the same 
"scale" greatly simplifies genome comparisons. 

Duplications and deletions 

We detected putative duplications (relative to the D 5 reference ge- 
nome) in the other diploids using MACS (with default settings), 
a commonly used tool for ChlP-seq analysis (Zhang et al 2008). It 
empirically models peaks in coverage of ChlP-seq reads using a dy- 
namic Poisson distribution, thereby estimating the location of a DNA 
binding molecule. Here, we used MACS to call coverage peaks within 
WGS reads from the A-genome and F-genome diploids after align- 
ment to the D-genome reference. Assuming the libraries from both 
A-genomes and D- genomes would be equally biased, reads from D 5 - 
53 served as a control sample, estimating the expected coverage pat- 
tern. Peaks in A-genome coverage relative to D 5 -53 represent putative 



duplicated sites that were sampled at a higher frequency during se- 
quencing. To filter out false-positives, we also called coverage peaks in 
D 5 -2, D 5 -4, and D 5 -31, relative to D 5 -53. We used bedtools (Quinlan 
and Hall 2010) to compare peaks among and between the datasets and 
to identify their position relative to gene annotations in the D 5 version 
2.1 (Paterson et al 2012). 

Similarly, putative deletions in the test diploids were called if the 
reference sequence D 5 -53 had 20-times or higher coverage at both 
ends, as well as at an additional point at least 200 bp from either end 
of a region of at least 1000 bp, and if the test diploid had near-zero 
coverage (<3-times) at every point in that block. This detection was 
performed by Gapfall, part of the BamBam package (http://udall-lab. 
byu.edu). Blast2Go was used for an enrichment analysis (using the 
Fisher exact test) on genes duplicated or deleted in the A diploids 
(Conesa et al 2005). Default B2G parameters were used. 

Polyploid conversion events 

We used two methods to identify possible nonreciprocal homoeolo- 
gous or "gene conversion" events between the A T -genomes and D T - 
genomes of G. hirsutum cv. Maxxa. We first identified individual 
converted loci based on homoeo-SNPs, where reads from the A T - 
genome carried the D-genome nucleotide, or vice versa. The second 
method used duplications and deletions to identify regions of conver- 
sion, but for deletion detection using 15-times the minimum coverage 
for the duplicated genome and less than four-times the coverage for 
the "deleted" genome. If a region spanning at least 1 Kbp was "dupli- 
cated" in the A T - genome relative to the A diploids and "deleted" in 
the D T -genome relative to the D diploids, then an A T -biased conver- 
sion event was inferred. Similarly, an A T - genome deletion and D T - 
genome duplication suggested a D T -biased conversion. These analyses 
of the polyploid genome were limited to regions that were present in 
both diploid genomes (A and D) because homoeo-SNPs could only be 
predicted in such regions. 

RESULTS 

Intergenomic SNPs 

Similar to previous work on the detection and frequency of SNPs in 
genie regions (Page et al 2013), we produced a robust index of 
23,859,893 homoeo-SNPs between the genomes of diploid A-genome 
and D-genome cotton. These SNPs covered the genome of the 
D-genome reference sequence at a density of one SNP per 32.3 bases 
(Figure 1). This total number of SNPs is a dramatic increase from 
the number previously reported with the D-genome sequence (Paterson 
et al 2012) and in genie sequences (Page et al 2013). The index had 
a transition/transversion ratio of 1.92 (Table 1), similar to the Maize 
HapMap2 (Chia et al 2012). This genome- wide SNP analysis con- 
firmed our speculation that the previous ratio was downwardly biased 
in our gene-focused index. Across polymorphic nucleotide positions, 
there was not a significant difference between the GC biases of the 
A-genomes and D-genomes (45.4% and 45.1%, respectively). How- 
ever, these values were higher than the genome-wide GC content, 
suggesting an increased likelihood for SNPs at G or C nucleotides, 
possibly because of the high frequency of C — > T mutations caused by 
de-amination of cytosines. 

The genome-wide SNP index (SNP index 2.0) was based on 
comparisons of deep sequence coverage between multiple diploid 
A-genomes and D-genomes, so we anticipated it would be more 
robust and widely applicable for read mapping efforts of other 
diploids and allopolyploids than the previous index. The improved 
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Figure 1 Plot of genes, homoeo-SNPs, duplications, deletions, and 
conversion events in the A-genomes, relative to the D 5 reference se- 
quence, produced by Circos (Krzywinski et al. 2009). Considering the 
concentric circles from the outside inward, the outermost (and first) 
green circle indicates the location of annotated genes. The next circle 
(red) is a histogram of the number of homoeo-SNPs in a 1-Mbp win- 
dow throughout the genome. The next two red (high-frequency) to 
yellow (low-frequency) circles are heat maps showing the location of 
duplications in the A-i and A 2 genomes as compared to the D 5 ge- 
nome (A 2 interior). The next two blue (high-frequency) to yellow (low- 
frequency) circles are heat maps showing the location of deletions in 
the A<\ and A 2 genomes as compared to the D 5 genome (A 2 interior). 
The final two circles show conversion events in the tetraploid G. hirsu- 
tum cv. Maxxa. The first circle shows conversion of loci to the A nu- 
cleotide on a red-to-yellow scale, whereas the innermost (and last) 
circle shows conversion of loci to the D nucleotide on a blue-to-yellow 
scale. 



index increased mapping efficiency of A-genome reads to the D- 
genome reference sequence. With the SNP -tolerant mapping of 
GSNAP, more than 77% of A-genome reads mapped, reflecting 
a mapping improvement of approximately 15% compared to mapping 
without the SNP-index (Table 2, Supporting Information, Figure SI). 
D-genome mapping was unaffected (~95%). The error rate of cate- 
gorization of WGS reads was less than 2%, as estimated by categoriz- 
ing the diploid reads and looking for incorrectly categorized reads. 
Although this error rate is slightly higher than that estimated for the 
original PolyCat index (Page et al. 2013), it is still an acceptable rate 
considering the increased fraction of reads that overlap a SNP between 
genomes (~70%, up from <50%), and considering WGS reads map- 
ping to less conserved intergenic regions. 

Within the diploid index, genes had a median intergenomic SNP 
per base rate of 2.2% (range, 0-16.1%). Notably, there were 593 genes 
that had no unambiguous homoeo-SNPs between diploid A-genomes 
and D-genomes (Figure S2). Of these, 215 genes had one or more 
allele-SNPs {i.e., one diploid genome had two nucleotides, one match- 
ing the second diploid genome and the other nucleotide being novel). 
The remaining 378 genes were completely conserved across all acces- 
sions with no SNP differences. A Blast2Go enrichment analysis of 



these genes identified the following three enriched GO terms: NADH 
dehydrogenase (ubiquinone) activity (GO:0008137); NADH dehydro- 
genase (quinone) activity (GO:0050136); and NADH dehydrogenase 
activity (GO:0003954). Most of these genes were shorter than the 
average gene within the D-genome annotations (95 to 8113 bp with 
mean 810 ± 786 SD for the 378 genes vs. 89 to 51,174 bp with mean 
3249 ± 2806 SD for all 37,223 genes; Figure S2) (Paterson et al. 2012). 

In the polyploid, improved categorization of reads into its two 
separate genomes was enabled by the genomic SNP index. Using the 
SNPs from the diploids and the D-genome reference, PolyCat 
assigned more than 70% of mapped polyploid reads to the Ax- 
genomes or D T -genomes (Figure S3). For the tetraploid Maxxa, 
a greater percentage of reads were assigned to the A T - genome than 
to the D T -genome, despite the fact that categorization only occurred 
in regions shared by the two genomes. This later criteria preempted 
the larger A-genome from an A T categorization bias. The unexpect- 
edly higher categorization rate of A T reads may be partially explained 
by the fact that A x and A 2 diploids are a two-fold better approxima- 
tion of the A T -genome than D 5 is of the D T - genome. For example, 
nucleotide diversity appears to play a role in mapping efficiency 
among the diploid A-genome species. The most divergent line (A x - 
73) had the lowest mapping percentage of any of the of the A-genome 
diploid accessions. Because sequence divergence is less between the 
A-genome diploid and polyploid than the D-genome and polyploid, 
read categorization based on SNPs between the diploids would be more 
effective for the A T -genome, resulting in the observed bias. To a much 
lesser degree, the A T categorization bias may also be partially attributed 
to duplicated loci in the A T -genome mapping to a single locus in the 
D 5 reference, although these artifacts were largely avoided by the de- 
tection of duplications. 

Because of their recent common ancestry, many of the identified 
differences between the A-genome and D-genome diploids were 
retained between the A T -genomes and D T -genomes as homoeo-SNPs. 
A total of 20,828,020 homoeo-SNPs were identified between the Ax- 
genomes and D T -genomes of the allopolyploid cultivar Maxxa. The 
difference between the number of approximately 20 million SNPs in 
the polyploid and the "retained ancestral" homoeo-SNPs (~16 mil- 
lion; 75.8%) were autapomorphic SNPs that were derived after the 
divergence of the A T -genome and D T -genome from the A-genome 
and D-genome common ancestor, respectively. This portion of the 
homoeo-SNPs (5,046,151; 24.2%) was only identified between the 
genomes in the allopolyploid and not in the comparison of the diploid 
genome sequences. These unique, homoeo-SNPs were found through- 
out the genome in 34,810 of the 37,223 annotated genes. We antici- 
pate that additional polyploid autapomorphic SNPs will be identified 
as more polyploid genomes are re-sequenced. 

For all of the annotated genes in the D-genome reference, an 
alignment of A, A T , D T , and D-genomes was created, from which the 
amount of molecular evolution between the A-genomes and 
D-genomes of cotton was calculated (Table 3). The results of this 
effort concurred with our previously published work based on aligned 
EST contigs (Flagel et al. 2012), although SEs were much smaller 
because of the much larger dataset. We found slightly less divergence 
(dN and dS) between the polyploid genomes (n = 28,317) than be- 
tween the diploid genomes (n = 30,874), although the difference is not 
significant. The different totals between the diploids and polyploids 
suggested that more than 2500 genes in the polyploid did not have 
sufficient polymorphism for an appropriate estimation of molecular 
evolution. We further investigated the alignments of these genes to 
ascertain whether their close sequence similarity was the result of gene 
conversion between homoeologous genomes. Of the genes without 
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Table 2 Summary of nine diploid WGS re-sequencing libraries that were re-sequenced in this study and additional libraries 
(A1_97 # F1_1, and Maxxa) obtained from the SRA 
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dN/dS estimates, 759 were found to only have sufficient polymor- 
phisms between the tetraploid genomes (and not between the diploid 
genomes) and 3316 were found to have sufficient polymorphisms 
between the diploids (but not between the tetraploid genomes). This 
cumulative large difference between ploidy levels further suggested 
that gene conversions may play a role in reducing genetic diversity 
between genomes. However, only 106 and 42 genes were detected to 
overlap "conversion regions" in the diploids and tetraploid genome. 

Of the 2,817,991 SNPs between diploids that fell within genes, 
486,514 were inferred to be in exonic positions, including 248,599 that 
caused amino acid changes (i.e., nonsynonymous) compared to the 
reference sequence. Of these, there were 1651 genes with SNPs that 
resulted in premature stop codons in the pseudo-translation of 
A-genome transcripts, 1802 genes with premature stop codons in 
At, and 709 genes with premature stop codons in D T (Figure 2) These 
genes were not excluded in estimates of molecular evolution. None of 
these gene sets had any enriched GO terms. The low level of D T 
premature stops may simply reflect an ascertainment bias of the an- 
notated reference genome that was based on a diploid D genome. 
Many of the putative stop codons were found near the annotated 
end of the gene, suggesting that they might have only a minimal 
impact on protein function. Alternatively, their inference may reflect 
bioinformatic artifacts, such as imperfect gene annotation of the 
D -genome, or alternative stops that independently evolved in the 
A-genomes. Most of these alternative stops codons were within 10% 
of the 3' end of the gene. If one ignores the premature stop codons 
within the last 10% of the annotated genes, then 803 premature stops 
were shared between the diploid A-genomes and the A x -genome 
(Figure 2). This result was marginally less than previously reported 
(Paterson et al 2012), because we had the added power of multiple 
A-genome re-sequencing efforts. 

Other SNPs disrupted a start or stop codon. We identified 806 
genes with disrupted start codons (i.e., resulting in an amino acid 
distinct from that in the D 5 reference) in the A-genome, 703 in A T , 
and 684 in D T . These genes could have a longer or shorter coding 
sequence than as originally annotated. No GO terms were enriched in 
these gene sets. We also identified 831 genes with altered stop codons 
in the A-genome, 693 in A T , and 437 in D T , resulting in longer 
peptide sequences. Several GO terms (~20 in each genome) were 
enriched within these genes, and almost all were associated with pho- 
tosynthesis (Table SI). There were also 406 genes without a stop 



codon within the D 5 gene annotation, with the same photosynthesis 
GO terms being enriched. 

Diversity and heterozygosity 

In addition to creating an index of nucleotide differences between the 
diploid A-genomes and D-genomes, we detected unique nucleotide 
variation within and between individual accessions. Within a genome 
type (i.e., A or D), these types of SNPs are called allele-SNPs. The 
allelic genotype of each diploid and both genomes of the allopolyploid 
Maxxa were determined at all polymorphic loci. A pairwise compar- 
ison between accessions found that the D 5 diploids had extremely low 
nucleotide diversity (<1 million SNPs) between any two accessions, 
whereas a similar pairwise comparison between the Ax-genomes and 
A 2 -genomes found that accessions were more diverse (4-5 million 
SNPs within A l or A 2 ; 6-8 million SNPs between A x and A 2 ; Table 
S2). There were approximately twice as many SNPs between the A x - 
genomes and A 2 -genomes as within either of the two species. These 
results are not unexpected given the exceptionally low diversity found 
in a survey of allozyme diversity in G. raimondii (J. F. Wendel, un- 
published data) and the appreciable levels of diversity in the chosen 
accessions of G. arbor eum and G. herbaceum (Wendel et al. 1989). 

In addition to having more fixed allele-SNPs between accessions, 
the A-genome diploids were more heterozygous than the D -genome 
diploids (~13% and <1%, respectively; Table 4). In the A-genome 
diploids, heterozygous loci were approximately twice as frequent out- 
side than inside of genes. This was not surprising, given the expecta- 
tion of more intense purifying selection on coding sequences. Of 
course, these estimates of heterozygosity excluded loci that were du- 
plicated in the A-genome. Interestingly, heterozygous loci in the 
D-genome diploids were equally common in genie and nongenic 



Table 3 Amount of molecular evolution between the A and D 
genomes of cotton 
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Figure 2 Premature stop codons were found in each Gossypium ge- 
nome. (A) Premature stop codons (compared to the annotations of the 
D-reference genome) were found in the A, A T , and D T genomes. (B) 
Common genes with premature stop codons in the first 90% of the 
gene. 



regions. This genomic difference likely reflects both the exceptionally 
low genetic diversity within the D-genome and a high level of gener- 
alized inbreeding. In this respect, we note that G. raimondii has a nar- 
row natural range and presently exists as only scattered populations 
with very low effective population sizes. 

A neighbor-joining tree was constructed and rooted based on the 
known relationship between the A/F-genome clade and the D-genome 
clade (Figure 3). Many fixed allele-SNPs {i.e., not heterozygous within 
a line) could be attributed to mutations occurring along specific 
branches of the phylogeny (Table 5). The tree correctly reconstructed 
the accepted relationships of the diploids and their relationships to the 
two genomes of allopolyploid cotton (Senchina et al. 2003; Grover 
et al. 2012; Wendel et al. 2012). Specifically, and unsurprisingly, A T 
and D T were phylogenetically sister to the common ancestor of the 
[Ai + A 2 ] clade and the D 5 clade, respectively. Our results agreed with 
previous reports that the A x or A 2 diploids were approximately twice 
as good of an approximation of the A T -genome as the D 5 diploids 
were of the D x -genome (Wendel and Cronn 2003; Senchina et al. 
2003). The distance from D T to D 5 was 0.14, and the distance from 
A T to the common ancestor of A x and A 2 was 0.08. However, the 
distances from A x to A l and A 2 were 0.10 and 0.11, respectively. 
Moreover, the distance from A T to any individual A-genome diploid 
was 0.14, similar to the distance between D T and any D-genome 
diploid. Although the group of A-genome diploids provided an ap- 
proximation of A T that was two-times better than a group of D 5 
diploids did of D T , any individual A or D diploid appeared to be 
equally similar to its A x or D T counterpart. The exceptionally low 
diversity among D 5 diploids explained the fact that a group of D 5 - 
genome diploids was not significantly better for approximation of the 
D x -genome than was a single D 5 . However, multiple accessions of A 
diploids (A x and/or A 2 ) did provide a substantial (nearly two-times 



better) improvement in construction of the A T -genome pseudo- 
sequence. 

Duplications and deletions 

Duplications were detected in A-genome diploids as coverage peaked 
across the D-genome reference sequence (Figure 1). Because these 
duplications were detected relative to the D-genome diploids, they 
represent events that occurred after the split of these two clades 
5-10 million years ago (Wendel and Cronn 2003; Wendel et al. 
2012). Thus, peaks shared by all A-genome accessions represent pieces 
of the A-genome that are duplicated relative to the D-genome. These 
coverage peaks represent a mix of tandem and dispersed duplications, 
because the methodology used makes no distinction regarding geno- 
mic location of duplicated segments. There were 30,709 regions du- 
plicated in all A-genome diploids but no duplication in D-genome 
diploids. These duplicated blocks overlapped 1007 genes, with 
a minimum overlap of 50% of the gene length. Only one GO term 
was enriched among these genes: structural constituent of ribosome 
(GO: 0003735). 

In contrast to duplications, putatively deleted regions of the 
A-genome were detected with a higher degree of certainly because 
their diagnosis is based on lack of coverage rather than a quantitative 
difference in coverage (Figure 1). Some regions of the D-genome 
reference genome did not have any A-genome reads mapped to them, 
despite 40-times the WGS coverage based on the number of produced 
A-genome reads and correctly mapping D-genome reads to the same 
region. Each accession had a unique set of deleted regions, including 
genes (Figure S4). There were 25,408 regions deleted in all A-diploid 
genomes. The genomic regions included 978 annotated D-genome 
reference genes where the deleted region minimally overlapped 50% 
of the gene length. Among the genes within deleted regions, 118 GO 
terms were enriched compared to the population of GO terms within 
the annotated gene set (Paterson et al. 2012) (Table S3). Most of these 
deletion terms were associated with starch synthesis, tRNAs, or DNA 
repair mechanisms. Three hundred seventy-eight genes were com- 
pletely deleted in the A-genome diploids relative to the D-genome 
diploids, meaning that the genie region was spanned by a single de- 
letion block. 



Table 4 Number of heterozygous loci in each accession, along 
with the percentage of total observable loci that were heterozygous 
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Polyploid conversion events 

We used two different methods to detect conversion events in the 
allopolyploid genome of Acala Maxxa (G. hirsutum). The first method 
identified historical nonreciprocal homoeologous recombination 
events (NRHR) at individual loci based on homoeo-SNPs within 
a polyploid genome. Earlier analyses in cotton used this method to 
detect conversion events (or NRHR events) based on comparative 
analysis of assembled EST sequences from diploid and allopolyploid 
cotton (Salmon et al. 2010; Flagel et al. 2012). The second method 
identified converted regions based on coverage patterns in the Ax- 
genomes and D T -genomes relative to their respective diploid relatives. 
Here, we consider the NRHR events as "conversion" events regardless 
of whether they occur in coding or in noncoding sequences (sensu 
amplo of gene conversion). 

Based on homoeo-SNPs (first method), 1,748,889 conserved SNPs 
in 29,576 genes in the diploid genomes suggested an A T -biased allele 
conversion (a D T nucleotide converted to the A T nucleotide). In con- 
trast, a total of 361,795 SNPs in 12,346 genes suggested a D T -biased 
conversion. These data suggest a nearly five-fold bias based on 
homoeo-SNPs and a two-and-one-half-fold bias based on genes in 
favor of A T -biased conversion, in stark contrast to the two-fold bias in 
favor of D T -biased conversion reported previously (Paterson et al. 
2012). 

Based on coverage/deletion information (second method), con- 
version events were found in both directions across 882 Kbp of the 
D-genome reference sequence (Figure 1). These events ranged from 1 
to 5470 bp, with a median of 337 bp (Figure S5). Two hundred fifty- 
nine regions suggested an A T -biased conversion. These regions 
spanned 275 Kbp and overlapped the coding sequence of 19 genes. 
They also included 12,696 putative homoeo-SNPs (based on the dip- 
loids), none of which was detected within the tetraploid. However, 
1213 regions showed a D T -biased conversion. These regions spanned 
607 Kbp and overlapped the coding sequence of 94 genes. They in- 
cluded 21,142 putative homoeo-SNPs, of which only three were also 
detected within the tetraploid. The genes overlapped by these regions 
had no enriched GO terms and indicated a conversion bias in both 
directions, but with the D T direction most prominent, similar to that 



previously reported (Paterson et al. 2012). The events detected by the 
second method only included 1375 of the possibly conversion-related 
SNPs identified by the first method. 

DISCUSSION 

Genome resources for Gossypium 

Using the diploid re-sequencing data, we created several useful 
resources for the Gossypium genome. First, a genome-wide map of 
the SNPs between the diploid A-genomes and D-genomes of cotton 
was created. Re-sequencing multiple accessions of each diploid en- 
abled us to distinguish bases that were specific to a single accession 
from bases that are more representative of one diploid genome or 
another. It also allowed us to identify conserved genomic features 
shared by all A-genome or D-genome species and accessions. In that 
sense, the multiple accessions of each species acted as re-sequencing 
replications of the A-genome or D-genome "treatments." We have 
demonstrated that most SNPs identified between diploid genomes 
can be directly extrapolated to differences between the descendant 
allopolyploid genomes {i.e., homoeo-SNPs) because of their recent 
common ancestor. In addition, we also identified several million 
homoeo-SNPs that were unique to the Maxxa allopolyploid genome. 
These documented SNPs can be used for genome identification of 
individual sequence reads (Udall 2006; Flagel et al. 2012) or the de- 
velopment of genotyping assays (Van Deynze et al. 2009; Byers et al. 
2012). With an index of homoeo-SNPs, read-mapping efficiency was 
significantly improved and future false-positive allele-SNPs can be 
filtered out of marker sets, resulting in more reliable allele-SNP as- 
sessment. In addition to the homoeo-SNPs, we also identified allele- 
SNPs in the diploid cotton accessions. These SNP sets are available as 
Gbrowse tracks at CottonGen (http://www.cottongen.org) and as gff 
files at the Udall laboratory web site (http://udall-lab.byu.edu). 

A second resource is the set of alignments of gene and protein 
sequences of the A-genomes, A T -genomes, and D T -genomes to ac- 
company the previously published G. raimondii annotations (Paterson 
et al. 2012). These alignments were used to identify SNPs and to 
further refine our understanding of the molecular evolutionary 
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Table 5 SNPs attributable to specific areas of the phylogeny 



Genome SNPs Deletions 

All A 5,544,440 25,408 

At 1 ,024,299 3809 

A 2 1,152,825 2941 

A T 1 ,472,900 5247 

All D 14,601,331 0 

Dj 3,563,979 4518 



As shown in Figure 3. Because of possible conversion events, it is not possible to 
determine how many SNPs were shared by all A or D diploids. 

differences between genomes. Because we have made this a public 
resource, any researcher investigating cotton now has homoeo-SNPs 
and allele-SNPs information for any target gene already identified. 
This simple, yet tedious, task has been a common obstacle of genetic 
research in polyploid cotton. 

A third resource, and one that we suggest will be a fruitful topic for 
further investigation, is the description of putative duplications and 
deletions that distinguish the A-genomes and D-genomes and, hence, 
originated subsequent to their divergence from a common ancestor. 
These localized structural variations offer a rich source of sequences 
to mine for possible functional consequences, and to further our 
understanding of the mechanisms of copy number variation during 
genome evolution in plants. 

Through our read-mapping efforts, we noticed that the limited 
number and the stochastic distribution of homoeo-SNPs could have 
implications for de novo genome assembly of polyploid cotton. Al- 
though 70% of the reads from allopolyploid cotton could be assigned 
to one of its two co-resident genomes, 30% of the reads that mapped 
to the D -genome reference did not overlap a homoeo-SNP and, hence, 
could not be categorized. Using an arbitrary length of 1000 bp, we 
found 47,399 unique loci where sequence reads of the A T -genome and 
D T -genome were indistinguishable when compared to each other and 
to the reference genome. Assuming sequence read lengths less than 
500 bp, these regions would likely co-assemble during a de novo 
whole-genome shotgun assembly with current read lengths. Conse- 
quently, co- assembled segments will create unique challenges of graph 
structure bifurcation (or higher branching) during the contig con- 
struction steps of de novo assembly. Part of this challenge could be 
addressed by generating reads with a greater likelihood of overlapping 
homoeo-SNPs, i.e., longer reads. Present data, however, suggest that 
de novo assembly of the allopolyploid cotton genome would not be 
successful if based on contemporary read lengths. 

Insights into the genome biology of Gossypium 

One of the intriguing results of this study is the insight it provides into 
the origin and frequency of indels during A-genome and D-genome 
divergence. Because of the lack of an outgroup sequence, none of the 
"duplications" or "deletions" described here is polarized, so their du- 
plicate or deleted status is only relative to the single D-genome refer- 
ence. Moreover, the methods used do not yield insights into the 
mechanistic underpinnings of the indels, which may conceivably en- 
tail a full spectrum of deletional mechanisms and processes of tandem 
and dispersed duplication. 

Notwithstanding, the present study does reveal the scope and scale 
of the indel generating process during 5-10 million years of diploid 
evolution. Additionally intriguing are the genomic distributions of the 
duplicated and deleted regions. For example, chromosome 13 is no- 
table for its high frequency of duplications, containing one- sixth 
(2850/17,102) of the total number of conserved duplications in the 



A-genome, yet only 2.9% (174/6072) of the deletions. Given that 
chromosome 13 comprises a mere 7.8% of the G. raimondii reference 
sequence (Paterson et al. 2012), the suggestion arises that there has 
been exceptional expansion and/or contraction of this chromosome 
during the evolution of the two-fold size difference that distinguishes 
the A-genomes and D-genomes. Although some of this difference 
certainly reflects the expansion of transposable elements in the 
A-genome or possible contraction in the D-genome (Hawkins et al. 
2009), it is unclear what genomic features have allowed more numer- 
ous rearrangements in chromosome 13 than within other chromo- 
somes in the genome. 

In addition to this broad-scale view of the contributions of 
duplications and deletions to Gossypium genome evolution, the data 
presented here offer a rich database that can be mined for potentially 
significant gene duplication and deletion. For example, gene loss has 
been associated with polyploidization (Shaked et al. 2001; Ozkan 2003; 
Han et al. 2005; Tate et al. 2009), but the deletions we have described 
in the A-genome occurred before polyploidization and include parts 
of approximately 1300 genes per accession. If these A-genome acces- 
sions were used as a "parental genome reference" for investigations of 
polyploidy, then the deletions common to the Ai-genomes, A 2 - 
genomes, and ancestral A x -genomes would be confounded with any 
putative deletions that occurred as a result of polyploidization. Thus, 
this initial database of duplications and deletions will be a useful re- 
search tool for investigations of the evolution of the Gossypium 
genome. 

We observed that several genes involved in starch synthesis were 
deleted in the A-genome diploids, including seven genes with 1,4- 
alpha-glucan branching enzyme activity (Table S3). It is tempting to 
speculate that these deletions increased the amount of glucose avail- 
able in the A-genome diploid for cellulose synthesis and thereby 
played a role in the increased length of mature A-genome cotton 
fibers. Previous studies have documented altered carbon partitioning 
(Yong-Ling Ruan 1998) and altered starch accumulation (Chaohua 
et al. 2005) in fiberless G. hirsutum mutants. The deletion of starch 
genes in the A-genome may have been associated with the opposite 
effect, resulting in more carbon being allocated to cellulose production 
and less to starch production. We caution that many of the deleted 
genes are members of gene families and the remaining paralogs may 
partially or fully compensate for their deletion in the A-genome dip- 
loids. Nevertheless, the deleted genes discovered in this study offer 
interesting avenues of future research of gene duplication and func- 
tional compensation. 

Among genes with altered stop codons, we detected an enriched 
number of genes having photosynthesis-related functions. It appears 
unlikely that the altered stop codons are attributable to horizontal 
transfer of chloroplast genes to the nucleus because only four have 
high similarity to chloroplast genes. Although a biological explanation 
for this enrichment remained a mystery, it was likely that a portion of 
the enrichment of photosynthesis GO terms was an artifact of the 
gene annotation process. For example, the actual stop codon location 
may have been ambiguous because the original annotation of these 
genes actually had no stop codon. Perhaps, the initial gene annotation 
effort was simply unable to identify the full coding sequence and 
subsequent updates will include corrections to the original annota- 
tions. Regardless of the initial annotation, the enrichment of 
photosynthesis GO terms in genes with altered stop codons was 
interesting, but it was not attributable to differences between the 
A-genomes, D-genomes, A T -genomes, and D T -genomes. 

We also identified homogenized genome regions from conversion 
events between the two homoeologous genomes of G. hirsutum. Gene 
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conversion has been denned as the nonreciprocal transfer of genetic 
information between homologous sequences, leading to homogeniza- 
tion during meiotic or mitotic recombination (Szostak et al. 1983; 
Chen et al. 2007; Hsu et al. 2010; Jacquemin et al 2011). Unlike 
analyses of pairs of genes (Drouin 2002; Mondragon-Palomino and 
Gaut 2005; Xu et al 2008), whole-genome analysis of gene conversion 
has been pioneered in rice (Xu et al 2008; Wang et al 2009; Jacque- 
min et al 2011), a sequenced diploid genome with many closely re- 
lated species also with sequenced genomes. By comparison of the 
diploid re- sequencing data to the publicly available WGS data of 
Maxxa Acala, we were able to identify conversion events between 
homoeologous sequences within the polyploid Gossypium genome 
using two different methods. The two methods of detecting homoe- 
ologous conversion events resulted in different directional biases. 

Using the first method, we had previously used SNPs to 
estimate that up to 5% of the polyploid transcriptome had 
experienced "homoeologous gene conversion" (Salmon et al 
2010; Flagel et al 2012). In both previous studies, identification 
of autapomorphic SNPs was not possible because of limited dip- 
loid sequencing data. Based on our current data, the presence of 
autapomorphic SNPs (and a liberal method of identification) 
appeared to have caused an overestimate in the amount of homoe- 
ologous conversion in genie regions. Thus, the genome sequence 
of a definitive outgroup is needed to unambiguously identify 
regions of conversion using SNP information alone. One dimen- 
sion of the conversion events and the multi- alignment resource for 
all genes is the identification of loci where one of the two allo- 
polyploid genomes has "overwritten" the other via a mechanism of 
reciprocal or nonreciprocal "gene conversion." At present, the 
functional consequences of these observations remain unexplored, 
but it is intriguing to ask whether these conversion events are 
functionally insignificant (which might, for example, be the case 
when only synonymous sites are involved) or if, instead, specific 
genes or regulatory sequences have been selectively "doubled" or 
"eliminated" by this unusual intergenomic aspect of allopolyploid 
speciation. 

This first method contains an inherent bias in favor of A x -biased 
conversion because of the greater genetic distance between D T and D 5 
compared to the distance between A T and A x or A 2 . This bias, how- 
ever, should only be approximately 50% based on our understanding 
of the genetic distances between A and A T vs. D and D T (the latter is 
50% greater) (Flagel et al 2012). In addition, the genotype pattern 
indicative of a conversion is indistinguishable from that caused by an 
autapomorphic SNP in the diploid species. For example, suppose an 
autapomorphic SNP in the A-genome ancestor of A x and A 2 (not 
shared with A T ) changes a C to a T at a given locus. Consequently, 
the D-genome diploids, D T -genome, and A T -genome would all have 
a C, whereas the A-genome diploids would have a T. So, the A T 
nucleotide would appear the same as D and D T , suggesting a D T - 
biased allele conversion event, even though they simply shared the 
ancestral allele. These confounding autapomorphic SNPs would have 
occurred after the divergence of A T but before the A x -A 2 split, sug- 
gesting a D T -biased conversion. However, an A T -biased conversion 
would be suggested by an autapomorphic SNP occurring after the 
divergence of D T , but before the most recent common ancestor of 
the D 5 diploids. Because the A-genome diploid is approximately 
a two-fold better approximation of the actual progenitor diploid than 
is the D-genome diploid (Wendel et al 2012), these branch lengths are 
different (Figure 3). In fact, the phylogeny showed a distance of only 
0.00447 for the branch corresponding to an autapomorphic SNP 
shared by all A diploids but not by A x . However, the equivalent 



branch in the D-genome clade has an autapomorphic SNP distance 
of 0.05385. These numbers suggest that, in the absence of any actual 
conversion events, there should be more than 12-times (0.05385/ 
0.00447) as many SNPs that look like A T -biased conversion — because 
they are shared by A diploids but not by At — as SNPs that look like D x - 
biased conversion. The difference between the 12-fold expected value 
for A T -biased conversions as visualized by the branch lengths to the 
five-fold observed value could be explained by a bias toward D x -biased 
conversion events, as reported elsewhere and as detected by the second 
coverage-based method. Thus, we consider the conversion events 
detected by the SNP-based method (method 1) to be inaccurate based 
on autapomorphic SNPs, but we consider the conversion events of the 
coverage method (method 2) to be a conservative, yet relatively accurate, 
assessment of conversion between the polyploid G. hirsutum genomes. 

The second method of conversion detection used deletion and 
coverage information to detect many separate events, and the 
direction of bias agreed with previous reports (Paterson et al 2012). 
This method was very conservative and may represent a minimum 
amount of conversion events in the polyploid genome because of the 
uncertainty of the actual endpoints of conversion and the additional 
amounts of conversion suggested between homoeologous gene copies 
in the dN/dS analysis. The conversion events resulted in a loss of 
genomic diversity between the A T - and D T -genomes. Parts of at least 
113 genes were included in conversion events between homoeologous 
chromosomes. Other investigations of genome evolution in rice have 
uncovered convergent evolution of ancient paralogs on Chrll and 
Chrl2 (~2.1 Mb) mediated by gene conversion including up to 180 
genes (Jacquemin et al 2011). The conversion events we have de- 
scribed were more recent (after polyploidization 1-2 Ma) and our 
inference space was limited to a single species of G. hirsutum. It will 
be interesting to see if other polyploid Gossypium genomes also have 
the same conversion events in their genomes, and to estimate the rate 
of gene conversion between homoeologous genomes. 

Whole-genome re-sequencing of diploid Gossypium has identified 
insights into the genome evolution of cotton. These insights proved to 
be useful for characterization of the G. hirsutum genome via publicly 
available re-sequencing data. Additional de novo and re-sequencing 
efforts of polyploid Gossypium will continue to add to our under- 
standing of the cotton genome, thereby enhancing our ability to ma- 
nipulate the fiber and agronomic characteristics of cotton. 
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